C/C++ Users Group Library 1996 July

home *** CD-ROM | disk | FTP | other *** search

/ C/C++ Users Group Library 1996 July / C-C++ Users Group Library July 1996.iso / vol_300 / 370_01 / gat_ref.asc < prev next >

Wrap

Text File | 1992-10-27 | 118KB | 3,265 lines

GA Toolkit Version 1.0 Sara A. Lienau University of Missouri-Kansas City May 20, 1992 TABLE OF CONTENTS {toc \f|1. INTRODUCTION 1 Overview of Genetic Algorithms 1 The GA Toolkit 3 Overview of Documentation 4 2. INSTALLATION 5 3. USER INTERFACE PROGRAM 8 Selecting GA Components 8 Description of Available Components 9 Selecting the Problem to Solve 12 Description of Available Problems 12 Creating an Executable GA Program 14 Creating Input Files 15 Creating the "in" File 16 Description of Input Parameters 17 Creating the "genes" File 20 Defining a New Problem 24 4. EXECUTING YOUR GA PROGRAM 29 Description of Output Files 30 5. PROGRAMMER'S REFERENCE 33 Description of Classes 33 Description of #define Directives 36 Adding a New GA Component 38 Adding a New Parameter 50 REFERENCES 56 } CHAPTER 1 {tc "1. INTRODUCTION"}Introduction This document describes a software tool for researching and applying Genetic Algorithms (GAs), called the GA Toolkit. GAs are powerful search and optimization techniques based on the principles of natural selection and population genetics. This chapter provides a brief overview of Genetic Algorithms, describes the GA Toolkit and present an overview of the remainder of this document. {tc "Overview of Genetic Algorithms" \l 2}Overview of Genetic Algorithms GAs manipulate a population of sample points from the search space. Each sample point or individual in the population has several components--chromosome, phenotype and fitness value. The chromosome is a coded representation for a particular point in the search space. In biological systems, the chromosomes contains the coded instructions for creating all the features of an organism. Similarly, in GAs, the chromosome contains a coded representation for the various features or parameters for the problem the GA is attempting to optimize. A phenotype is the expression of the genetic material. For example, a gene on a human chromosome may contain the code for blue eyes and the human with blue eyes is the expression of this genetic material or the phenotype. In GAs, the phenotype corresponds to decoding the chromosome structure into components or parameters that are meaningful in terms of the problem the GA is attempting to solve. The GA treats the optimization problem as a black box--it applies a set up inputs to the problem and measures the overall performance (called the fitness of the solution) of the set of inputs. The GA does not require knowledge of nor does it understand the relationships that may exist between various parameters of the problem. The GA uses the fitness value of the sample points and the similarities in their coded representation (the chromosome) to search for better points in the search space. This concept comes from viewing the evolution of species as an optimization process. Darwin's "survival of the fittest" principle (also called natural selection) suggested that species become better adapted to their environment because only the "fittest" individuals are able to survive and reproduce passing their genetic code on to subsequent generations. The "fittest" individuals are those with a some type of beneficial features or characteristics that enables the creature to survive and reproduce in its environment. As individuals with various beneficial characteristics mate subsequent generations are more likely to contain individuals with combinations of advantageous features that were possessed by their parents. And over time the population as a whole becomes better suited for its environment by preserving and propagating the beneficial characteristics. GAs use the principle of natural selection to mate the better individuals in the population with a higher frequency. The mating process, called crossover, exchanges segments of the parents' coded chromosome to create new offspring. This process allows the GA the combine features of good solutions in the hope of developing better solutions to the problem. The following describes the basic execution cycle of a Genetic Algorithm. Initialization Evaluation repeat Selection Recombination Evaluation Replacement until ( termination condition satisfied ) The Initialization phase randomly creates an initial population. At first the GA has no information about the search space, so randomly sampling a number of points from the search space is a good place to start the search process. In the Evaluation phase, each chromosome is decoded and a fitness value is assigned using the fitness function defined for the problem. In the Selection phase, individuals in the population are selected for mating based on their fitness with more fit individuals having a greater probability of being selected. During the Recombination phase, the individuals selected in the previous phase mate via a crossover procedure and exchange segments of their coded chromosome to create new offspring. In addition, during this phase, the new offspring or the parents may undergo mutation that slightly modifies their genetic code. Then the new offspring and any parents that were modified by mutation are re-evaluated as described above. And finally, the Replacement phase determines which offspring and members of the current population will survive into the next generation. This cycle repeats until a termination condition has been satisfied, either a sufficiently fit population member or a maximum number of reproductive trials. For a more detailed description of Genetic Algorithms see Goldberg (1989) or Holland (1975). {tc "The GA Toolkit" \L 2}The GA Toolkit The description of GAs above presents a general framework for these algorithms. There are actually various implementation for each phase, that have been proposed by the leading researchers in the field, each with varying degrees of effectiveness on different types of problems. With the variety of techniques available, it is difficult to package together a single GA configuration that is best suited for all problem domains. The GA Toolkit is a software tool designed to provide a better alternative for researching and applying Genetic Algorithms. The GA Toolkit provides several techniques for each phase and allows the user to select which component to use. The GA Toolkit will actually create an executable GA program with the components that the user selects. This system allows the user to create a variety of GA programs for solving different types of problems. In addition, the Toolkit was designed so that new GA components could easily be added to the system as better techniques are developed. This tool offers many benefits to a variety of users. For those users that want to apply GAs, the Toolkit provides a collection of the most successful techniques available. These users can gain the benefits of using GAs without detailed knowledge of the underlying theory and without developing their own program code. For GA researchers, this tool greatly increases the ability to explore the effectiveness of different techniques in a variety of problem domains. In addition, the GA Toolkit allows the researcher to experiment with a particular technique in a number of different contexts. For example, the researcher could observer the performance of a selection technique using different crossover and replacement methods and develop a better evaluation of the selection procedure. This ability to isolate the effectiveness of a particular method is essential for developing useful guidelines on which components should be used in different problem domains. {tc "Overview of Documentation" \L 2}Overview of Documentation The remainder of this document describes how use the GA Toolkit. Chapter 2 describes the installation process. Chapter 3 discusses the user interface program that allows the user to select the desired components and create an executable GA program. This chapter also describes the GA components and the set of problems that are currently implemented in the GA Toolkit. Chapter 4 discusses how to execute the GA programs you create with the Toolkit and describes the output files produced by a GA program. And finally, Chapter 5 provides a reference for programmers who which to modify the system. This chapter describes the classes used to implement the GA techniques and discusses how to add new GA components to the Toolkit. Chapter 2 {tc "2. INSTALLATION"}Installation To install the GA Toolkit you need two files: install and gat.tar. Currently these files are located in the /home/slienau/Toolkit directory on the Sun SPARC workstations. If this directory does not exist when want to install the GA Toolkit, check with Dr. Graham about the new location of these files. Once you have found the necessary files you need to copy them to your home directory. The install file is a Bourne shell script that will create a main directory for the GA Toolkit, un-archive the files in the gat.tar file and create the executable user interface program (gatuif). Use the following command to execute the installation program. % install [directory name] The optional parameter allows you to specify the name of the directory where you want the GA Toolkit installed. If this parameter is omitted the Toolkit will be installed a directory called GATool. During the installation process several messages will be displayed about the files that are being un-archived. When the installation is complete a message will be displayed on your terminal. This message will inform you that you need to add the information in the cshrc.gat file (located in the Toolkit's main directory) to your .cshrc file in your home directory. This file contains several environment variables that are necessary for the Toolkit to operate correctly. The following shows the contents of this file. Note: In your file, the GATOOL environment variable may have a different directory name. # # GA Toolkit Env Variables # setenv GATOOL $home/GATool setenv GATSRC $GATOOL/Source setenv GATFNS $GATSRC/Funcs setenv GATIN $GATOOL/Input setenv GATHLP $GATOOL/Help setenv GATUIF $GATOOL/Uif setenv GATBIN $GATOOL/SH_BIN set path = ($GATBIN $GATUIF $path) The installation program creates several sub-directories under the directory where the GA Toolkit was installed. The following briefly describes the contents of each directory. Source contains the source code for all the GA components. Source/Funcs contains the source code for the test problems provided with the Toolkit. Input contains input files for the test problems in the Source/Funcs directory. Help contains the help files for the user interface program. Uif contains the source code for the user interface program. SH_BIN contains several Bourne shell programs used by the user interface program. The library of components in the GA Toolkit were written in the C++ language. In order to create executable GA programs you must be able to execute the C++ compiler. If you do not have the proper environment variable set up in your .cshrc file for the C++ compiler, then you will also need to add this information. The information you, currently, need to include in your .cshrc file for the C++ compiler to work properly is shown below. # # C++ Env Variables (sun C++) # setenv MANPATH /usr/share/man:/usr/local/man:/usr/man setenv CCROOTDIR /usr/local/lang/SC1.0 setenv CCLIBDIR $CCROOTDIR setenv I $CCROOTDIR/include/CC setenv LIBRARY $CCROOTDIR/libC.a setenv cfrontC $CCROOTDIR/cfront set path = (/usr/local/lang/SC1.0 $path) After you have made the necessary modifications to your .cshrc file you should logout and login again so that the changes will take effect. Now you have completely installed the Toolkit and are ready to begin working with the user interface program described in the next chapter. Chapter 3 {tc "3. USER INTERFACE PROGRAM"}User Interface Program The user interface program for the GA Toolkit allows the user to select a component for each phase of the GA, select the problem to solve, create the necessary input files, create an executable GA program and create a template for defining a new problem. This section describes how to uses the various features of the user interface program. To start the user interface program use the following command: % gatuif The name "gatuif" is an acronym for GA Toolkit User InterFace. After a few seconds the main menu will appear on your terminal. From the main menu, you can access all the major features of the user interface program, such as selecting GA components and creating an executable GA program. Pressing the return key will activate the highlighted choice. The following describes each of the options available on the main menu. {tc "Selecting GA Components" \L 2}Selecting GA Components The first step in creating an executable GA program is selecting which components to use for each phase of the GA process. Activating the Select GA Components option will bring up a screen divided into two parts. On the left side of the screen there is a menu of components and the right portion of the screen shows the current configuration--the component that is currently selected for each phase. To change the component for a particular phase, use the up and down arrows to select the category on the menu (the left side of the screen). Pressing the return key will bring up a list of available components or a sub-menu for the highlighted category. Once the desired technique is highlighted, pressing the return key will select that technique. To see a short description of a particular GA component highlight it by moving the up and down arrow keys and press the F1 key (or the help key for your terminal). When you are satisfied with the configuration of components, press CTRL-U to return to the main menu. This portion of the user interface program also manages any constraints on which components may used together. When you select a category from the menu (e.g., Selection or Replacement) only those techniques that are compatible with the current configuration will be displayed. This will prevent you from accidentally creating an invalid GA configuration. {tc "Description of Available Components" \L 3}Description of Available Components Chromosome--Representation - The chromosome representation is how the chromosome structure is implemented. Currently the Toolkit only provides one implementation of the chromosome structure, but others, such as diploid or two-dimensional chromosomes, could be added. 1-D Bit String - With this component, the chromosome is implemented as a one dimensional string of ones and zeros. Chromosome--Decoding - The chromosome decoding specifies how the chromosome string will be interpreted or decoded for the evaluation function. Binary - This component provides binary decoding of the chromosome string (i.e., converting the string of 1s and 0s to a decimal integer). Gray - This component allows the chromosome string to be interpreted as Gray code. With Gray code, adjacent integers (e.g., 7 and 8 or 8 and 9) have a Hamming distance of one. Reproduction - The reproduction style defines how many offspring are produced before the replacement phase is initiated. Generational - In generational reproduction, an entire generation of offspring are produced and then the replacement phase determines which offspring and parents will survive in the next generation. Individual - With individual reproduction, as soon an offspring is created the replacement phase selects a member of the population to be removed in order to make room of the new offspring. The concept behind individual reproduction is that the GA could adapt faster if offspring are made available as soon as they are produced rather than waiting for a complete generation. Selection - The selection technique determines which members will be given an opportunity to reproduce and pass their genetic code onto subsequent generations. In general, the selection technique will provide more reproductive trials to the more fit individuals of the population. SUS - In Stochastic Universal Sampling (SUS) (Baker 1987), each individual is allocated a slice on a spinning wheel according to its expected value. An individual's expected value is the ratio of the its fitness to average fitness of the population. The wheel contains N evenly spaced pointers, where N is the size of the population. The wheel is spun one time and the individual at each pointer is allocated a reproductive trial. With this selection technique, each individual is guaranteed to receive either {EMBED Equation |} or {EMBED Equation |} trials. GENESIS Rank - GENESIS Rank selection (Grefenstette 1990) is similar to SUS using a spinning wheel with N evenly spaced pointer. But the size of the slice on the wheel is determined by the individual's rank within the population rather than the actual fitness value. Linear Rank - Linear Rank selection (Whitley 1989) uses a linear function to bias the allocation of reproductive trials toward the better ranked individuals. Linear Rank selection provides a parameter, called Selection Bias, so that the degree of preference for the best members can be adjusted. For example, a Selection Bias of 1.5 means that the best individual is 1.5 times more likely to receive a reproductive trial than the median ranked individual in the population. Replacement - The replacement phase determines which individuals will be replaced when new offspring are produced. Since the replacement phase is initiated at different times with the different styles of reproduction, there are several techniques that are appropriate for each type of reproduction. Replacement Techniques for Generational Reproduction DeJong's Gap w/Single Elitism - This technique implements DeJong's Gap model (1975) that allows overlapping population. The DeJong Gap specifies what percentage of the next generation should be created by the genetic operators (selection, crossover and mutation) and the remainder of the population is filled by randomly selecting, without replacement, individuals from the parent population. For example, a DeJong Gap of 0.8 would mean that 80% of the next generation would be new offspring and the remaining 20% would be randomly selected members from the parent population. This technique also provides a single elitism feature which guarantees that the best solution developed so far will always have a slot in the next generation. Variable Elitism - This replacement policy allows the user to specify the percentage of the population slots (by the Elite Gap parameter) to reserve for the best solutions developed so far and the remaining slots are filled by new offspring. With this technique, N offspring are created by the genetic operators (selection, crossover and mutation) and then the best N * Elite Gap individuals from among both the new offspring and parents receive a slot in the next generation. Any remaining population slots are filled by randomly selecting, without replacement, members from the set of new offspring. For example, an Elite Gap of 0.35 reserves 35% of the next generation for the best individuals and the remaining 65% is filled with new offspring. Replacement Techniques for Individual Reproduction Remove Worst - With this replacement policy, when an offspring is created the worst member of the population is replaced. Reverse Rank - This replacement policy uses a linear function to bias the selection for replacement towards the worst ranked members of the population. The Replacement Bias parameter can be adjusted to altered the pressure on the worst individuals. For example, a Replacement Bias of 1.2 means that the worst individual in 1.2 times more likely to be replaced than the median ranked individual in the population. In addition, the Elite Gap parameter can also be used to explicitly preserve a percentage of the best ranked individuals. For example, an Elite Gap of 0.1 with a population size of 50 means that the best 10% of the population (the 5 highest ranked members) cannot be replaced. Reverse Tournament - With this replacement technique, two members from the population are randomly selected and the least fit member is replaced by the new offspring. Generational Emulation - This replacement policy is attempting to emulate the replacement policy used in standard generational GAs while using individual reproduction. This policy monitors the trials allocated and used by each individual. When a member has exhausted its allocated trials or did not receive any trials because of low fitness it becomes a candidate for replacement. When an offspring is created, a randomly selected individual from the set of candidates is replaced. This replacement policy can only be used with selection techniques that explicitly calculate the number of trials each individual should receive (i.e., SUS and GENESIS Rank selection). Crossover--Style - The crossover style specifies how the desired number of cut points will be selected. Traditional - With traditional crossover the cut points are randomly selected. Reduced Surrogate - This crossover technique (Booker 1987) attempts to create offspring that differ from their parents whenever possible. This techniques notes the position on the chromosome string where the parents have different values (called the "reduced surrogate") and randomly selects cuts points between the first and last position in the reduced surrogate. For example, Parent1 11001100 Parent2 10010110 Reduced -1-01-0- Surrogate -0-10-1- A cut point any where after the second position and the before the seventh position would produce an offspring that differed from its parents. Uniform - With Uniform crossover (Syswerda 1989), the parents' value for each position on the chromosome string is swapped with a fixed probability to create new offspring. For example, if the swapping probability is 0.5, then for each position on the chromosome string we would flip a fair coin to decide whether the offspring should receive the bit value from the first or second parent. Crossover--# Cut Points - This category specifies the number of cut points that will be selected or the how many segments to exchange between parents when creating new offspring. You can select either one point or two point crossover. Mutation Canonical - Once a bit has be selected for mutation, this technique randomly chooses a new value for the bit from the set of possible values. Statistics - This category defines the statistics to collect and report about the GA experiment. Basic Stats - This technique provides information about population convergence, the best and average performance of the current population as well as on-line (average of all solutions tested) and off-line (average of the best solution for each generation) performance measures. See DeJong (1975) for a discussion of on-line and off-line performance measures. {tc "Selecting the Problem to Solve" \L 2}Selecting the Problem to Solve The Select Problem option on the main menu allows you to select the problem the GA will attempt to solve. Pressing the return key when Select Problem is highlighted on the main menu will bring up a screen divided into two parts. The top of the screen displays the name of the problem that is currently selected. The bottom portion of the screen lists all the problems that are currently defined in the system. To change the problem, use the up and down arrow keys to highlight the desired problem, in the bottom portion of the screen, and press the return key to select the problem. After you have selected the desired problem, press CTRL-U to return to the main menu. {tc "Description of Available Problems" \L 3}Description of Available Problems Counting 1s - In this problem, the GA is attempting to maximized the number of 1s in the chromosome string. The fitness value is the number of 1s in the chromosome string. (File func.ct1, minimization problem) Partially Deceptive - This is an artificially created partially deceptive problem. This problem consists of ten 4-bit sub-problems distributed at positions i , i + 10, i + 20, i + 30, for i = 1 to 10 on the chromosome string. For each 4-bit sub-problem, the order-1 and order-2 schemata are fully deceptive, but the order-3 schemata are not deceptive. (File func.d4, maximization problem) See Whitley (1991) for a discussion of deception in GAs. DeJong's F1 through F5 are the five problems from DeJong's test suite. (See DeJong 1975 for a description of these problems.) Ackley's Fig. 1-7 - This problem (Ackley 1987, 16-17) is basically a flat space with several very steep hills. The problem is defined by the following function: {EMBED Equation |}. (File func.fig7, maximization problem) Noisy Ackley's Fig. 1-7 - This problem is similar to Ackley's Fig. 1-7 except that a noise factor is used to alter the value of the function in about half the chromosomes evaluated. The noise factor is created by generating a random number in the range [0, 1) and using it as a percentage of how much the actual function value should be altered (i.e., noise factor = random number * function value). The GA randomly chooses to add or subtract the noise factor to the actual function value. (File func.nfig7, maximization problem) Noisy Shifted 3-D Parabola - This problem also incorporates a noise factor in a space defined by the following function: {EMBED Equation |}. This space is a 3-dimensional parabola shifted slightly off the origin. The noise factor is defined by the following: noise factor = (random number / 2) * function value. The GA randomly chooses to add or subtract the noise factor to the actual function value. On average, half of the function evaluation will be altered by the noise factor. (func.npar, minimization problem) Shifted 3-D Parabola - The space of this problem is a 3-dimensional parabola shifted slightly off the origin and defined by the following function: {EMBED Equation |}. (File func.par, minimization problem) Select/Replace Test - This problem was used to observe the characteristics of various combinations of selection and replacement techniques. The function used for this problem is the following: {EMBED Equation |}. (File func.repl, maximization problem) TSP-Swap - This is a 16 city Traveling Salesman Problem where we are attempting to find the shortest tour between 16 cities in the continental US traveling by automobile. In this problem, we use the information in the chromosome to tell us how to create a permutation of the list of cities to visit. Each pair of genes on the chromosome, tell us which two cities to swap to create a different tour and perhaps located the optimum tour. (File func.swap, minimization problem) TSP-New-Swap - This problem is similar to the TSP-Swap problem, but we interpret the information in the chromosome slightly different. Instead of using pairs of genes for swapping instructions, we use the gene position and the gene value. For example, if the first gene has a value of 5 then we swap the cities at the first and fifth position on our list, if the second gene has a value of 4 then we swap the cities at the second and fourth position on the list, and so on. (File func.swp, minimization problem) {tc "Creating an Executable GA Program" \L 2}Creating an Executable GA Program After you have selected the component to use for each GA phase and the problem the GA will be applied to, you can create the executable GA program. Use the up and down arrow keys to select the Make GA option from the main menu and press the return key. A dialog box will appear asking you to give a name for the executable GA program. After providing a name the system will begin re-compiling the necessary program code. This may take several minutes depending on which components were changed, so please be patient. When the executable program has been created a message will appear in the dialog box informing you that the program was successfully created. Pressing the return key after receiving the success message will return you to the main menu. Compilation error should not occur if you are using the components original provided with the GA Toolkit. However, if you have incorporated additional components or modified the components in the Toolkit compilation errors could occur if you have made an error in the program code. If errors were encountered during the re-compiling process a message will appear that will inform you of this event. This message will also tell you about a file that contains all the message produced during the re-compilation process. Pressing the return key while this message is displayed will return you to the main menu where you should exit from the user interface program. By examining the file containing the messages from the re-compilation process, you should be able to locate where the error occurred and be able to correct it. Before you can execute the program you have just created you need to create several input files. This process is described in the next section. In addition, you should refer to Chapter 4 for a discussion of how to execute your GA program and a description of the output files created by the GA program. {tc "Creating Input Files" \L 2}Creating Input Files Before you can execute the GA program you will need to create two input files. The in file contains various user defined parameters for controlling the GA experiment such as the maximum number of trials and the crossover and mutation probabilities. The genes file contains information on how to interpret the chromosome string for the problem the GA is attempting to solve. For example, the genes file specifies the number of genes in the chromosome and the format for displaying the gene values. This section describes how to create these input file through the user interface program. In addition, this section also briefly describes the contents of each file. You begin the input file creation process by selecting the Create Input Files option on the main menu and pressing the return key. This will bring up a dialog box asking you to provide a file suffix for the input files. When you execute your GA program, the program will, by default, look for the "in" and "genes" input files. However, providing the GA program with the file suffix will cause the program to read the information from the input files with the specified suffix. For example, if you use the suffix "p1" (for problem 1) then the GA program will take the input information from the "in.p1" and "genes.p1" input files. Using a file suffix is helpful if you want to experiment with several problems with different input parameters and different chromosome formats. If you want to use a file suffix, then type a suffix in the space provided and press the return key. Otherwise, just press the return key to accept the default value (no file suffix). Then the Setup Input Files screen is displayed. This screen allows you to choose which file (in or genes) to create. Use the up and down arrow keys to highlight the desired option and press the return key to select the option. The following describes how to create the in file and describes the various parameters contained in this file. The Creating the "genes" File section will describe the genes file and its contents. {tc "Creating the \"in\" File" \L 3}Creating the "in" File Selecting the in file option will bring up a screen divided into two parts. The left side of the screen contains a menu with Edit and Save options. The right side of the screen lists the various user-defined parameters and default values for each. The default values are not the product of any scientific study to develop optimum parameter values--they are only reasonable settings for each parameter. Depending on the type of problem you want to solve the default values may or may not be appropriate. Feel free to change the value of any or all parameters. To edit the values of the parameters, press the return key when the Edit option is highlighted. A parameter on the right side of the screen will be highlighted (usually the first parameter on the screen). Use the up and down arrow keys to highlight a parameter. The SHIFT-U and SHIFT-D key combinations can also be used to scroll up or down an entire screen. Pressing the return key will bring up a dialog box asking you to enter a value for the highlighted parameter. If you enter an invalid value for the parameter a dialog box will appear explaining the appropriate values. After you have read this information, pressing the return key will allow you to return to the previous dialog box and enter an appropriate value. You can change as many parameters as you want by highlighting the parameter and pressing the return key. If you want more information about the parameters, pressing the F1 key will bring up a help dialog box that contains short descriptions of all of the parameters. When you have finished adjusting the parameter values, use the CTRL-U key combination to return to the menu on the left side of the screen. Then you can select the Save option to save your parameter settings. To exit from the in file creating section press CTRL-U again and you will return to the Setup Input Files screen that contains the menu where you selected which file to create (in or genes). {tc "Description of Input Parameters" \L 3}Description of Input Parameters Total Experiments - This is the number of experiments you would like to conduct with this set of input parameters. That is, the number of complete GA runs where the GA converges to solution or exceeds the maximum trials. In each experiment, the GA starts with the same initial population and then the random number generator is seeded with a different value so that a different set of choices will be made. Total Experiments should be an integer number greater than zero. Total Trials - This is the maximum number of trials you want to occur before the GA is stopped. Total Trials should be an integer number greater than zero. Population Size - This is the number of chromosomes or sample points of the search space you want the system to work with. Population Size should an even integer number greater than zero. (The number must be even because two parents are selected and always produce two offspring.) DeJong Gap - This is used with the DeJong Gap with Single Elitism replacement policy. This parameter should be a real number in the following range [0.0, 1.0]. The Gap measure specifies what percentage of the population should be created via the genetic operators (selection, crossover and mutation) and the remaining portion of the population are randomly selected members of the previous generation. For example, a Gap of 0.8 would mean 80% of new generation is created by selection, crossover and mutation and the remaining 20% are copies of randomly selected members of the previous generation. Elite Gap - This is used with the Variable Elitism and the Reverse Rank replacement policies. This parameter should be a real number in the following range [0.0, 1.0]. Elite Gap specifies the percentage of the population reserved for the best members so far. For example, suppose we have a population of 50 chromosomes and an Elite Gap of 0.1. This means we want to preserve the best 10% or 5 members. Prob. Cross - This is the crossover probability or how often crossover should occur. This should be a real number in the following range [0.0, 1.0]. Prob. Mutation - This is the mutation probability or how often mutation should occur. This should be a real number in the following range [0.0, 1.0]. Selection Bias - This is used with the Linear Rank selection technique. This is how much preference should be given to the better members of the population. Selection Bias should be a real number in the following range (1.0, 2.0]. For example, a selection bias of 1.5 means the best member of the population is 1.5 times more likely to be selected than the median ranked individual of the population. Replacement Bias - This is used with the Reverse Rank replacement policy. This is how much pressure should be placed on replacing the worst individuals in the population. Replacement Bias should be a real number in the following range (1.0, 2.0]. For example, a Replacement Bias of 1.5 means the worst individual is 1.5 times more likely to be replaced than the median ranked individual. Maximum Spin - This is used to stop the experiment if no progress is being made. If a new chromosome has not be created in (Maximum Spin) generations then the population has probably converged and no real progress is being made so we should stop this experiment. This parameter should be an integer number greater than zero. Statistics Interval - This is how many trials should occur between printing experiment statistics. This parameter should be an integer greater than zero. Statistics are only calculated after (Population Size) attempted trials or roughly each generation. The number of actual trials (when a chromosome needs to be evaluated) may be less than the attempted trials. So the Statistics Interval is a lower bound on the number of trials that must occur before another set of statistics is printed. That is, the actual number of trials before the reporting of statistics may be slightly more than the specified Statistics Interval. Hamming Metrics - This is a boolean flag for whether the clustering and drift Hamming metrics (Frederick, Sedlmeyer and White 1991) for population convergence should be calculated and reported. Clustering measure the average Hamming distance between all pairs of chromosomes in the population. And drift measures the average Hamming distance between the best individual and all other chromosomes in the population. Best Size - This is how many of the best chromosomes developed should be saved and printed to a file (the best file described in Chapter 4) at the end of the experiment. This parameter should be an integer greater or equal to zero. If Best Size is greater zero, then the (Best Size) members with the best fitness values developed in the entire experiment are preserved in a separate area for reporting purposes only, regardless if any elitism policy is used in the other phases of the GA. Worst Size - This is used as a scaling mechanism with SUS selection. This is similar to the scaling window used in the GENESIS GA program. This parameter should be an integer number greater or equal to zero. If we are minimizing a numerical function, then the fitness of a member (used for the selection process) is the worst fitness value minus the function evaluation for this particular member. The worst fitness value is the worst member seen in the last (Worst Size * Population Size) trials, if Worst Size is greater than 0. This allows more differentiation between members whose fitness values are very close together. The smaller the Worst Size (greater than zero) the greater the differentiation. For a Worst Size of zero, the worst value is the worst member seen in the entire experiment thus far. This represents an infinite scaling window or very little ability to differentiation between very similar values. This scaling effect works analogously for maximization problems as well. Maximize - This is a boolean flag to indicate whether the evaluation function should be maximized or minimized. Turn the flag on (or yes) for maximization and off (or no) for minimization. Note: If you are setting parameters for one the problems provided with the GA Toolkit make sure you set this flag correctly, otherwise you will get unexpected results. The Description of Available Problems section above states whether each problem should be maximized or minimized. Evaluate All - This is a boolean flag to tell the system whether to evaluate all chromosomes or only those that need evaluation. Generally, if a new chromosome (offspring) is equal to one of its parents we do not need to re-evaluate the function again, but simply copy the function parameters and fitness measure from the matching parent. However, in noisy or non-stationary functions the same chromosome may have a different function evaluation at varying times, so all chromosomes inserted into the next generation need to be re-evaluated. Setting this flag on (or yes) forces all chromosomes to be evaluated at the end of each generation. If this parameter is turned off (or no), new chromosomes are only evaluated if they differ from their parents. Seed Population - This is a boolean flag for whether the initial population should be seeded with specific chromosomes from an input file. Setting the flag on (or yes) will cause the program to read the init file (or the init.suffix file if you provide the GA program a file suffix). This file should contain the chromosome strings you want to place in the initial population with one chromosome per line. This file can contain any where from one to an entire population of chromosomes. Debugging Level - This is used for tracing the procedure calls activated during the GA run. The debugging feature can produce and a lot of information that is not very interesting, but may be helpful in debugging a new technique. I would recommend always setting this flag to 0 (no debugging trace). However, if you are interested in this information a value of 1 shows entry and exit of all major procedures. A value of 2 produces the same information as a value of 1 but also includes entry into class constructors and destructors. And finally a value of 3 shows the information for a value of 2 plus some addition information about major events that are occurring. Display - This is a boolean flag to indicate whether to show, on the screen, information about the GA's execution. This display includes information about the current trial, experiment statistics, the best chromosome so far and which GA components are being utilized. Setting the flag to on (or yes) turns the display mode on. Print Generations - This is a boolean flag to determine whether the population should be printed to a file (the pops file described in Chapter 4) at the end of each generation. Turning this flag on (or yes) will cause this information to be printed. Random Seed - This is the seed for the random number generation procedures. This parameter should be a long integer greater or equal to zero. Swapping Prob. - This is the swapping probability for Uniform crossover. This determines how frequently the positions on the chromosome string will be swapped. This should be a real number in the following range [0.0, 1.0]. {tc "Creating the \"genes\" File" \L 3}Creating the "genes" File The genes file describes the structure of the chromosome string. This file specifies how many genes the chromosome contains, the length of each gene and the format for displaying the value of each gene. If you are applying a GA to one of the pre-defined problems that was installed with the GA Toolkit, then you do not need to create the genes file. The genes files for these problems have already been created and are located in the Input sub-directory. The suffix of the genes file corresponds the suffix of the file where the problem is defined (e.g., genes.ct1 is the genes file for the Counting Ones problem defined in the func.ct1 file). If you need to define a new genes file the following describes the process for accomplishing this task. Selecting the genes file from the Setup Input Files screen and pressing the return key brings up a dialog box asking whether you will be using the floating point representation. If the problem the GA is being applied to is defined using the floating point decoding mechanism (i.e., the eval_chrom class that defines the problem is derived from the float_chrom class), then you should answer yes to this question. The answer to the floating point representation question determines what type of information is required in the genes file. After providing an answer to the floating point representation question, a screen divided into two sections will appear. The left side is again a menu that allows you to add, edit or delete gene specifications and save the information to the genes file. The right portion of the screen is a table format for displaying the gene specifications that you will create. For floating point representation, the table contains fields for the gene number, the minimum and maximum value of the floating point range, the number of values in the range, the format for printing the gene value and the length of the gene on the chromosome string. When you are not using the floating point representation the table contains three fields: the gene number, the format for printing the gene value and the length of the gene. Adding Genes The procedure for specifying the information in the genes file is similar whether or not the floating point representation is being used. The only difference is the fields that need to be specified. To add information about a gene highlight the Add Gene option in the menu on the left side of the screen and press the return key. Then the system will ask you to provide a value for each to the appropriate fields. The following describes each field you will be required to specify. Fields for Floating Point Representation Minimum Value - this is the smallest value in the desired floating point range. Maximum Value - this is the largest value in the desired floating point range. Number of Values - this is the number of values in the floating point range and defines the precision of the values. The value for this field must be a power of 2 (e.g., 2, 4, 8, 16, 32, 64, ...). For example, suppose the minimum value is -5.11, maximum value is 5.12 and the number of values is 1024, then the gene can represent values in the following range [-5.11, -5.10, -5.09, ... , 5.10, 5.11, 5.12] and the precision is 0.01. Format - this is the printf conversion string for printing the value of the gene. The gene values are store in an array of type double. The following are valid conversion strings: %f - for floating point conversion (e.g., 5.043245 using %.3f is printed as 5.043). %e and %E - for scientific notation (e.g., 504.3245 using %.4e is printed as 5.0432e+02 and using %.4E is printed as 5.0432E+02). %g and %G - %e or %E is used if the exponent is less than -4 or greater than or equal to the precision; otherwise %f is used (e.g., 504.3245 using %.5g is printed as 504.32 and 50432.45 using %.3g is printed as 5.04e+04). For more explanation of printf conversion strings refer to a C or C++ programming manual. Note that since the values of the genes are stored as type double, the only valid conversion characters are f, e, E, g and G. Length - this field will automatically be calculated for you by using Number of Values field. For example, it the number of values field is 64 then the length is 6 positions on the chromosome string (log(64) base 2 = 6). Repetitions - After you have provided values for all the required fields the system will ask you how many repetitions of the specification you wish to create. If you have multiple genes with the same specification you will not need to repeatedly enter the same information over and over again, just specify the number of copies to create. Fields for Non-Floating Point Representation Format - this is again the printf conversion string for printing the gene value, similar to the format field with floating point representation (see above). Length - this is the number of positions on the chromosome string that the gene will occupy. Repetitions - similar to the floating point representation you can instruct the system to create multiple copies the gene specification you have just created. After you have finished the gene specification, the values for each field will be displayed in the table on the right side of the screen. If you create more than a screen full of genes then you can use SHIFT-U and SHIFT-D to scroll up or down to view the genes you have created. At this point you can edit or delete a gene you have created or add additional genes. Editing a Gene To edit a gene specification, highlight the Edit Gene option and press the return key. A dialog box will appear asking you for the number of the gene you wish to modify. The gene number is the first column in the table on the right portion of the screen. After you have provided the gene number a pop-up menu appears that contains the fields which you may edit. Using the up and down arrow keys to highlight the field to modify and press the return key. Another dialog box will appear asking you to enter a new value for the field. From the pop-up menu of fields you can modify any or all fields of that particular gene. When you are finished modifying the gene press CTRL-U to return to menu on the left side of the screen. Deleting a Gene If you made a mistake and want to completely remove a gene, highlight the Delete Gene option and press the return key. A dialog box will appear asking you for the number of the gene you wish to delete. Once you remove a gene the only way to undo this action is re-specific the gene through the Add Gene option on the menu. Saving the genes File and Exiting And finally, when you are finished creating the information about the genes, use the Save option to write this information to the genes file. To exit the genes specification section press the CTRL-U key combination. This will return you to the Setup Input Files screen, where you selected which file to create either the in or genes file. If you have already created both input files you can use the CTRL-U key combination to return to the main menu of the user interface program. {tc "Defining a New Problem" \L 2}Defining a New Problem The Create New Problem option on the main menu allows you to create a template file for defining the evaluation function for a new problem. Pressing the return key when the Create New Problem option is highlighted initiates this option. A series of dialog boxes will appear asking you questions about the new problem. You will first be asked to enter a suffix for the file that will contain the new problem. All evaluation functions are keep in files with "func" as the root name of the file. The file suffix allows the system to identify the proper file when you select a particular problem to solve. (Note: You should use a unique suffix otherwise the file for a previously defined problem will be replaced with the new file you are creating. See the Description of Available Problems above for the suffixes that are currently being used.) The second dialog box that will appear asks you to provide a short descriptive name for the problem. The name you provide is the description that will appear in the list of problems when you select the problem to solve through the Select Problem option on the main menu. The final question that you will be required to answer is the type of values that will be required by the new evaluation function. The system needs to know the type of values that are needed so that it can set up the appropriate representation and decoding mechanism. In this dialog box you have a menu of four choices. You can use the up and down arrows to highlight the desired type of values and press the return key to select that option. The following describes the four types of values. Properties of Chromosome String - this option is for evaluation functions that do not require the genes on the chromosome to be decoded as integers or floating point values. The fitness of the chromosome will be determined by some type of property of the binary chromosome representation. For example, the Counting Ones Problem would use this type of value since the fitness is related to the number of 1s in the chromosome string. Integers - Binary Decoding - this option is for evaluation functions that require integer parameters where you want the genes to be interpreted as binary code. That is, the genes on the chromosome string with be converted from the binary representation to decimal integers. Integers - Gray Decoding - this is similar to the previous option but the genes on the chromosome string will be interpreted as Gray code rather than binary code. Floating Point Values - this option is for evaluation functions that require floating point values. In the case of floating point values you do not need to specify the decoding mechanism (i.e., binary or Gray) that will be used when you define the problem. The decoding mechanism can be selected through the Select GA Components option on the main menu when you create the GA program that will be applied to this problem. After you have chosen the type of values that are required for the new problem, a message will appear telling you that the file for this problem has been successfully created and that you need to edit the file to add the program code for the evaluation function. Pressing the return key while this message is displayed will return to the main menu. If you want the edit the new file you will need to exit the user interface program. The files for all the evaluation functions are keep in the Source/Funcs sub-directory of the main directory where the GA Toolkit was installed. To edit the new file you will need to go to this directory. Your new file will have the "func" root name and the suffix you provided (e.g., func.test would be the file created if you specified "test" as the suffix). You can use any text editor to add the program code for your new evaluation function. The file you created will look similar to the following. The contents of the actual file are printed in a Courier font and additional comments about the file are printed in a Times Roman font. /* GA Toolkit - Sara A. Lienau * * Center for Advanced Technology * Computer Science and Telecommunications Program * University of Missouri - Kansas City * ------------------------------------------------------------------------ * file: eval_chr.temp * * purpose: eval_chrom class template * * modified: 21 aug 91 - created */ #include "extern.h" #include "chrom.h" #include "eval_chr.h" // It may be helpful to document some information about the function // here. This is entirely optional. /* Function Description The Name/Source item contains the short descriptive name that you specified for this problem. This is the name that will be displayed on the list of available problems from the Select Problem section of the user interface program. * Name/Source: Test Problem 1 * Description: * Properties: * Min or Max: * Min: * Max: * Limits: * Space: The END_CHROM item is used to specify which class the eval_chrom class will be derived from. This determines what type of values will be provided to the evaluation function. If you need to change the type of values that you originally specified you should use one of the following class names: bit_chrom, gray_chrom or float_chrom. The bit_chrom class should be used for values from the properties of the chromosome string or integer values using binary decoding. The gray_chrom class should be used for integer values using Gray decoding. And the float_chrom class should be used for interpreting the genes as floating point values. * END_CHROM: gray_chrom * GENES info in: */ /*------------------------------------------------------------------------ -- class eval_chrom (objective function) ------------------------------------------------------------------------*/ The get_name function is used as a kind of sanity check. This function returns the name of the problem being used and is printed in an output file (the log file described in Chapter 4). This is useful to confirm that you are solving the problem you intended. char* eval_chrom::get_name(void){ // get name - returns a string describing the evaluation function // and chromosome representation char name[60]; strcpy(name, "Test Problem 1, "); strcat(name, END_CHROM::get_name()); return( name ); } The eval function is where you will insert the program code for calculating the fitness of each chromosome. void eval_chrom::eval(void){ // apply evaluation function to chrom The decode function is used to decode the genes of the chromosome string into the appropriate type of values for your evaluation or fitness function. As noted in the comments in the file, you only need to call the decode function if the evaluation function requires floating point or integer parameters. If you are calculating the fitness value on some property of the chromosome string, you can comment out or remove the call to the decode function. // decode bit string first (only necessary if func. requires floating // point or integers) END_CHROM::decode(); /*---- BEGIN MODIFY ----*/ /* Insert code for evaluation function here. Final value MUST be * assigned to the variable "fitness". * * Summary of Variables * str - Chromosome Bit String * array of unsigned short int * LCHROM - size of "str" array * vect - Decoded Genes of Chromosome if decode function called * array of double * GENES - size of "vect" array * * CAUTION: str, LCHROM & GENES should be treated as read-only * variables. Assigning values to these variables may cause * unpredictable behavior. However, you may assign values to the * elements of the "vect" array if you want something other than * the decoded genes to be printed by the chromosome print function. */ The above comments give a summary of the variable you have access to for calculating the fitness of the chromosome. As an example suppose we want to optimize the function {EMBED Equation |} and we have properly defined three genes (one for each parameter) in the genes file. The following is an example of how to code the fitness function. The decode function above will be called to place the decoded values of the genes in the "vect" array. double x, y, z; // vect[0] is the x parameter, vect[1] is the y parameter and vect[2] is the z parameter x = vect[0] * vect[0]; // x^2 y = vect[1] * 3; // 3*y z = vect[2] - 2; z = z * z; // (z-2)^2 fitness = x + y + z; /*---- END MODIFY ----*/ // clear eval flag do_eval = 0; } After you have written the program code for the new problem you can apply a GA program to this problem by selecting it through the Select Problem option on the main menu. Chapter 4 {tc "4. EXECUTING YOUR GA PROGRAM"}Executing YOUR GA Program This chapter describes how to execute the GA programs you create through the user interface program and the output files produced by the GA program. The GA program works with several input files and creates several output files. You can direct the GA program to used the default names for these files (i.e., file with no suffix) or used the files with a specified suffix. To execute the GA program use the name you assigned to the program as the command and you may supply an optional parameter for the file suffix (i.e., % ga_program_name [suffix]). For example, if you selected the default name for your GA program (i.e., GA.EXE) and you want the program to read input files with the "test" suffix (i.e., in.test and genes.test) and create output files with this suffix use the following command. % GA.EXE test If you do not supply the optional command line parameter the GA program will look for the default input files (i.e., in and genes) and the program will use the default names for the output files created during the run (i.e., best, log, out, and pops). The file suffix options allows you to save the input and output information about different experiments without continually renaming the files used by the system. {tc "Description of Output Files" \L 2}Description of Output Files best The best file contains the specified number of the best chromosomes developed during each experiment. The number of chromosome printed in this file is controlled by the Best Size parameter in the in file. This file is only created if the Best Size parameter is greater than zero. This file contains the chromosome string, the value of each gene, the fitness value and the generation and trial when the chromosome was first created. The following is an example of this file. Experiment: 1 101101010101001110100101110011 2.13 -1.98 -1.41 0.5954 10 193 The first line is the chromosome string. The second line show the values of the genes defined for this chromosome. The last line contains the fitness value (0.5954) and the generation (10) and trial (193) when this chromosome was created, respectively. log The log file contains general information about the GA experiment. It contains the date and time the experiment was conducted, most of the values of the input parameters and the GA components used for each phase of the GA process. The following is an example of a log file. Experiment Conducted: Sun May 3 13:29:10 1992 Input Parameters: Total Experiments = 1 Total Trials = 200 Population Size = 20 Chromosome Length = 30 DeJong Gap = 1.0000 Elite Gap = 1.0000 Prob. Cross = 1.0000 Prob. Mutation = 0.0010 Selection Bias = 1.5000 Replacement Bias = 1.5000 Maximum Spin = 3 Best Size = 1 Worst Size = 3 Maximize = 0 Random Seed = 123456789 Genetic Algorithm Components Chromosome: Shifted 3-D Parabola, Floating Pt, 1-D Bit Str. Reproduction: Generational Selection: GENESIS Rank Replacement: DeJong Gap w/Single Elitism Crossover: 1 PT Reduced Surrogate Mutation: Canonical Mutation Statistics: Basic Stats out The out file contains the experiment statistics that are printed after each Statistics Interval (input parameter in the in file). At each Statistics Interval the following information in printed. Current Generation - this is the number of generational cycles completed. Current Trial - this is the number of trials or chromosomes created so far. Lost Allele - this is the number of allele (positions on chromosome) where the entire population has the same value. Converged Allele - this is the number of allele where at least 80% of the population has the same value. Allele Bias - this is the average percentage of the most prominent value for each allele. For example, a bias of 0.70 means that, on average, each position has converged to 70% 0s or 70% 1s. The minimum bias is 0.5 when there is an equal distribution of 1s and 0s for each allele. On-line performance - this is the mean of all solutions tested to this point in the run. (DeJong 1975) Off-line performance - this is the mean of the best solutions developed at the end of each generation. (DeJong 1975) Best - this is the fitness value of the best solution in the current population. Average - this is the average fitness of the population at this point in the experiment. If the Hamming Metrics flag is turned on then the On-line and Off-line performance fields are replaced by the Clustering and Drift metrics, respectively. Clustering - this is the average Hamming distance between each pair of chromosomes in the population. A value near 1.0 indicates a lot of diversity in the population. As the clustering metric moves toward 0.0 the population is converging and losing diversity. (Frederick, Sedlmeyer and White 1991) Drift - this is the average Hamming distance between the best chromosome and all other chromosomes in the population. Similar to the clustering metric, a value near 1.0 indicates a lot of diversity and a value of 0.0 means the population has completely converged to a single chromosome. (Frederick, Sedlmeyer and White 1991) The following is an example of the out file. Experiment: 1 Gen Trials Lost Conv Bias OnLine Offline Best Average 0 20 0 0 0.582 3.2093e+01 1.1283e+01 2.874900e+00 3.209285e+01 2 56 0 2 0.635 2.8387e+01 5.8203e+00 2.750900e+00 2.168309e+01 4 94 0 6 0.670 2.3457e+01 4.4480e+00 9.473000e-01 1.582167e+01 6 134 0 8 0.687 1.9270e+01 3.3912e+00 9.033000e-01 7.569095e+00 8 170 3 13 0.732 1.6394e+01 2.8505e+00 8.090000e-01 4.364355e+00 10 208 3 18 0.772 1.4202e+01 2.4559e+00 5.954000e-01 4.710895e+00 pops The pops file contains a printed copy of all chromosomes in each generation. This file is only created if the Print Generation flag is turn on in the in file. Depending on the population size and the maximum number of trials this file can be very large. The following shows the format of this file. Experiment: 1 GENERATION: 0 111110010010000101011000101011 4.84 0.21 0.43 16.8946 0 1 001101111100110001010011010010 -2.89 -3.15 -3.02 27.6750 0 2 010101001000111011111110010101 -1.74 -2.73 4.05 52.1030 0 3 Each chromosome printed in the pops file has the same format as those printed in the best file. The first line contains the chromosome string. The second line displays the value of each gene defined for the chromosome. And the last line contains the fitness value and the generation and trials when the chromosome was created, respectively. CHAPTER 5 {tc "5. PROGRAMMER'S REFERENCE"}Programmer's Reference {tc "Description of Classes" \L 2}Description of Classes Chromosome Classes base_chrom - This class is the root of the chromosome hierarchy of classes. The base_chrom class provides several data items that are common to all chromosomes regardless of their actual implementation, such as a variable for the fitness value. (Files chrom.h & chrom.C) bit_chrom - The bit_chrom class implements a one-dimensional binary string chromosome representation. This class is derived from base_chrom and provides services for the other GA phases to access and modify the chromosome structure. (Files chrom.h & chrom.C) gray_chrom - The gray_chrom class is derived from bit_chrom and overloads the decoding method so that the chromosome can be interpreted as Gray code (rather than binary). (Files chrom.h & chrom.C) float_chrom - The float_chrom class provides a decoding mechanism that allows the chromosome to be decoded as floating point values rather than integers. The decoding procedure first decodes the chromosome to integers and then maps the integer value to a floating point range. This class can be derived from either bit_chrom or gray_chrom so that the chromosome can be interpreted as binary or Gray code. (Files chrom.h & chrom.C) eval_chrom - The eval_chrom class implements the problem specific evaluation or fitness function. This class can be derived from bit_chrom, gray_chrom or float_chrom depending on what type of values the evaluation function requires (i.e., integers or real numbers). (File eval_chr.h & eval_chr.C) Reproduction Classes base_pop - The base_pop class is the root of the population hierarchy of classes. It implements data structures and procedures that are common to both generational and individual reproduction. For example, both style of reproduction need at least on population data structure and they share common initialization and termination procedures for the basic execution cycle of the GA process. (Files populat.h & populat.C) pop_gen - The pop_gen class is derived from base_pop and implements the basic execution cycle for generational reproduction. (Files pop_gen.h & pop_gen.C) pop_ind - The pop_ind class is derived from base_pop and implements the basic execution cycle for individual reproduction, where one or two offspring are created and immediately inserted into the population. (Files pop_ind.h & pop_ind.C) Selection Classes rank_sel - The rank_sel class implements Linear Rank selection and can be used with either generational or individual reproduction. (Files rank_sel.h & rank_sel.C) rank_sus_gen - The rank_sus_gen class implements GENESIS Rank selection for generational reproduction. (Files rk_sus_gen.h & rk_sus_gen.C) rank_sus_ind - This class implements GENESIS Rank selection for individual reproduction. (Files rk_sus_ind.h & rk_sus_ind.C) sus_gen - The sus_gen class implements SUS selection for generational reproduction. (Files sus_gen.h & sus_gen.C) sus_ind - This class implements SUS selection for individual reproduction. (Files sus_ind.h & sus_gen.C) Replacement Classes base_replace_g - The base_replace_g class is the root of the hierarchy for generational replacement policies. This class implements a shared procedure for storing new offspring until an entire generation has been created. (Files repl_gen.h & repl_gen.C) gen_gap_e1 - The gen_gap_e1 class is derived from the base_replace_g class and implements DeJong's Gap with Single Elitism replacement policy for generational reproduction. (Files repl_gen.h & repl_gen.C) gen_e_n - The gen_e_n class is derived from the base_replace_g class and implements the Variable Elitism replacement policy for generational reproduction. (Files repl_gen.h & repl_gen.C) killer - The killer class implements the Remove Worst (kill_worst procedure), Reverse Rank (rev_rank procedure) and Reverse Tournament (rev_tourn procedure) replacement policies for individual reproduction. (Files killer.h & killer.C) repl_rank - The repl_rank class implements tasks that Linear Rank selection, when used with individual reproduction, requires the replacement phase to accomplish (i.e., inserting new offspring in the proper order according to their rank). This class is derived from the killer class so that the replacement policy can be varied independent of the additional tasks that the selection technique requires. (Files repl_rank.h & repl_rank.C) repl_rk_sus - The repl_rk_sus class implements tasks that GENESIS Rank selection, for individual reproduction (rank_sus_ind), requires the replacement policy to accomplish. This class is derived from the killer class. The repl_rk_sus class also implements the Generational Emulation replacement policy (the kill_term procedure). (Files repl_rk_sus.h & repl_rk_sus.C) repl_sus - The repl_sus class implements tasks that SUS selection, for individual reproduction (sus_ind), requires the replacement phase to accomplish. This class is derived from the killer class. The repl_sus class also implements the Generational Emulation replacement policy (the kill_term procedure). (Files repl_sus.h & repl_sus.C) Crossover Classes n_pt - The n_pt class implements Traditional crossover where the cut points are randomly selected. (Files n_pt.h & n_pt.C) reduced_sur - The reduced_sur class implements the Reduced Surrogate crossover technique. (Files reduced.h & reduced.C) one_pt - The one_pt class can be derived from either n_pt or reduced_sur to provide a one point crossover procedure for either Traditional or Reduced Surrogate crossover, respectively. (Files cross.h & cross.C) two_pt - The two_pt class similar to the one_pt class can be derive from either n_pt or reduced_sur for a crossover technique where two cut points are selected. (Files cross.h & cross.C) uniform_xo - The uniform_xo class implements Uniform crossover. This class is derived from n_pt which provides the procedure for determining when crossover should be performed according to the crossover probability. But the uniform_xo class overloads the procedure for selecting the actual cut points. (Files unixo.h & unixo.C) Mutation Class can_mute - The can_mute class implements the Canonical mutation technique. (Files can_mute.h & can_mute.C) Statistics Classes basic_stat - The basic_stat class is used to collect and report various statistics about the GA experiment. (Files stat.h & stat.C) extreme_perfs - The extreme_perfs class is used by the basic_stat class for monitoring the best and worst performances or solutions developed by the GA. The best individuals may be printed to a file at the end of each experiment. The worst solutions developed are used as a scaling mechanism with fitness-based selection techniques (e.g., SUS). (Files stat.h & stat.C) {tc "Description of #define Directives" \L 2}Description of #define Directives With the GA Toolkit the user is easily able to exchange the components used for each phase of the GA process. To exchange components without manually modifying the program code, #define preprocessor directives of the C++ language were used where a particular class name needed to specified in the source code. Different techniques are easily exchange by changing the class name that is associated with the appropriate #define identifier. For example, the SELECT identifier specifies which selection technique the user desires. So when we needed to create an instance of the a selection class in the source code, we used the SELECT identifier name rather than the actual class name. Then we can easily change which selection technique is used by associating a different class name with the SELECT identifier. For example, #define SELECT sus_ind // SUS selection for individual reproduction // create instance of a selection class SELECT select_obj; // rather than sus_ind select_obj; The following describes the purpose of each #define identifier and the class names that may be associated with each identifier. #define Identifier Valid Class Names Description REP_CHROM bit_chrom The REP_CHROM identifier specifies the representation chromosome class. (Located in file def_rep.h) CODE_CHROM bit_chrom gray_chrom The CODE_CHROM identifier specifies the class that implements the desired method of chromosome decoding . (Located in file def_rep.h) END_CHROM bit_chrom gray_chrom float_chrom The END_CHROM identifier specifies the class that the eval_chrom class will be derived from. (Located in file def_rep2.h) CHROM eval_chrom The CHROM identifier specifies the complete chromosome class that all the other phases manipulate (e.g., selection and crossover). (Located in file def_chr.h) POP_TYPE pop_gen pop_ind The POP_TYPE identifier specifies the style of reproduction (generational or individual) that will be used. (Located in file def_pop.h) SELECT rank_sel rank_sus_gen rank_sus_ind sus_gen sus_ind The SELECT identifier specifies the selection technique to use. (Located in file def_phase.h) REPLACE gen_gap_e1 gen_e_n repl_rank repl_rk_sus repl_sus The REPLACE identifier specifies the replacement class to use. (Located in file def_phase.h) KILLER kill_worst rev_rank rev_tourn kill_term The KILLER identifier specifies the replacement policy when using individual reproduction. (Located in file def_kill.h) CROSS_TYPE n_pt reduced_sur The CROSS_TYPE identifier specifies which class to use for selecting the desired number of cut points. This is the class where one_pt or two_pt will be derived from. (Located in file def_xo.h) CROSS one_pt two_pt uniform_xo The CROSS identifier specifies the crossover technique to use. (Located in file def_phase.h) MUTE can_mute The MUTE identifier specifies the mutation technique to use. (Located in file def_phase.h) STATS basic_stat The STATS identifier specifies the class to use for collecting and reporting experiment statistics. (Located in file def_phase.h) {tc "Adding a New GA Component" \L 2}Adding a New GA Component This section describes how to add a new GA component to the Toolkit's library. This process requires that you create the program code for your new component and modify several of the existing source code files. The following describes the necessary steps and illustrates this process by showing how the uniform crossover technique was added to the Toolkit. 1. Create Program Code for New Component The first step is to write the program code for your new component. The major rule you must follow is that the class definition of your new component must have the same public interface as the classes already defined in the Toolkit for that phase. This is so that you can easily exchange the GA components. The header files and comments in the source code describe the interfaces for each phase. The source code files for your component should be placed in the Source sub-directory where the Toolkit was installed. The more familiar you are with the source code and the #define identifiers the easier it will be to develop the code for your new technique. As an example, the following describes the modifications that were necessary to incorporate the uniform crossover technique. The first step was to look at the public interface for the crossover phase. In the crossover hierarchy there were originally four classes: one_pt, two_pt, n_pt, and reduced_sur. The one_pt and two_pt classes implement crossover with the corresponding number of cut points. These classes can be derived from either n_pt or reduced_sur so that the desired number of cut point(s) can be randomly selected (n_pt) or selected by the Reduced Surrogate method (class reduced_sur, which attempts to create offspring that differ from the parents whenever possible). The other phases of the GA will interact with an instance of the one_pt or two_pt class. So we should first look at the public interface for these classes. The class definition for one_pt is shown below (this is in the cross.h file). class one_pt : public CROSS_TYPE { public: one_pt(void) : CROSS_TYPE(1){} char* get_name(void); }; The use of the CROSS_TYPE identifier allows us change which class (n_pt or reduced_sur) one_pt will be derived from. The public interface contains a constructor and the get_name method. The constructor calls the parent class' constructor with a parameter for the number of cut points (i.e., 1). The get_name method returns the name of the component for printing in the log file. Now we need to look at the parent class for methods that it inherits. The following shows the class definition for the n_pt class (located in the n_pt.h file). class n_pt { public: n_pt(int); ~n_pt(void) { delete cuts; } char* get_name(void); void cross(CHROM*, CHROM*, CHROM*, CHROM*); protected: int *cuts; //array of cut positions int no_cuts; //number of cuts (n) int total_cross; //total crosses to perform int cur_cross; //crosses counter virtual void pick_cuts(void); //pick cut point(s) }; The cross method is the main procedure for implementing the crossover technique. The cross method determines whether to perform crossover based on the crossover probability. If crossover should be performed the pick_cuts method is called to randomly select the desired number of cut points. Then the cross method will pass this list of cut point(s) to the chromosome object to actually exchange the desired segments from the parents to the offspring. The class we created for uniform crossover needed a similar cross method but we needed to pick the cut points differently. So we created a new class derived from the n_pt class and overloaded the pick_cuts method. We also needed to overload the get_name method so that it would return a name for the uniform crossover method. The following shows the class definition and implementation of the uniform_xo class (located in the unixo.h and unixo.C files). class uniform_xo : public n_pt { public: uniform_xo(void) : n_pt(LCHROM-1){} char* get_name(void); protected: void pick_cuts(void); }; char* uniform_xo::get_name(void){ return( "Uniform Crossover" ); } void uniform_xo::pick_cuts(void){ int i; int cur_toss, prev_toss; prev_toss = flip(SWAP_PROB); no_cuts = 0; for( i=1; i < LCHROM; i++ ){ cur_toss = flip(SWAP_PROB); if(cur_toss != prev_toss){ cuts[no_cuts] = i; no_cuts++; prev_toss = cur_toss; } } } The constructor calls the n_pt constructor with the maximum number of cut points allowed so that the data structures will be initialized correctly. The get_name method just returns the name of the technique. And the pick_cuts method determines whether each position on the chromosome string should be swapped, according to the swapping probability. If the previous position should be swapped and the current position should not (and vice versa) then this is noted as a cut point. After the pick_cuts methods develops of the list cut points the cross procedure (inherited from n_pt) will call a chromosome method to actually exchange the segments between the parents and the offspring. 2. Update the Appropriate #define Identifier After you create the program code for your new method, you will need to update the file that contains the appropriate #define identifier. The Description of #define Directives section above describes the purpose of each identifier and the file where it is located. You will need to add a line to the file that contains the appropriate identifier for the type of component you are adding. A little later we will get more specific about the necessary modification, but first you need to decide which #define identifier is appropriate for your new component. Returning to the uniform crossover example we have two possible #define identifiers the might apply: CROSS_TYPE and CROSS. CROSS_TYPE is used to specify which class the one_pt and two_pt classes will be derived from. The CROSS identifier specifies which class is used for the crossover phase. Since we have already specified that the uniform_xo class will be derived from the n_pt class the CROSS_TYPE identifier is not applicable in this case. Moreover, the uniform_xo class implements all the public interface features of a technique for the crossover phase and thus it should be under the CROSS identifier category. The following shows part of the def_phase.h file, that contains the CROSS #define identifier, and discusses the modifications that are require to update this file. // -- crossover //#define CROSS uniform_xo #define CROSS one_pt //#define CROSS two_pt // -- select // ****** gen style //#define SELECT sus_gen //#define SELECT rank_sel #define SELECT rank_sus_gen // ****** individual style //#define SELECT sus_ind //#define SELECT rank_sus_ind In the file that contains the appropriate #define identifier you will need to add a line containing the #define preprocessor directive with the identifier name and the class name for your new component. For example, "//#define CROSS uniform_xo". The new line must be preceded by the comment symbol "//" (the new style of C++ comments). This is necessary because the user interface program changes which components are used by commenting out the #define directive of the active component and then removing the comment symbol ("//") from the #define directive for the component you selected. As you can see from the section of the def_phase.h file shown above, only one definition for each identifier will be active (i.e., not commented out) at any one time. There are several reasons for listing all the appropriate values for each identifier and keeping only one active by removing the comment symbol. First, this is a convenient method to document the legal values for each identifier. Second, this is an addition level of security to prevent inappropriate values from being assigned to a #define identifier since the system must find an exact match before it will remove the comment symbol. 3. Update Header Files The next step is to place an #include preprocessor directive for the header file of your new class in one of several composite header files. The composite header files are files the contain nothing but #include directives for other header files. The use of these composite header files simplifies the process of making the appropriate section of the program code aware of and be able to use your new component. So instead of having locate all the places in the source code that need access to your new header file, you only need to add an #include directive in one of the composite header files. And the composite header files are already included in the appropriate places in the existing source code. Which composite header file you need to modify will depend on type of component you are adding to the system. The following describes the type of components that should be included in the different composite header files. hdr_chr.h - this file includes header files for classes in the chromosome hierarchy. hdr_xo.h - this file includes header files for classes in the crossover hierarchy. hdr_gen.h - this file includes header files for components that can only be used with generational reproduction GAs and that were not included in the previous composite header files (i.e., hdr_chr.h or hdr_xo.h). hdr_ind.h - this file includes header files for components that can only be used in individual reproduction GAs and that were not included in the previous composite header files. hdr_both.h - this file includes header files for components that can be used with either generational or individual reproduction GAs and that were not included in the previous composite header files. For example, we added the following line to the hdr_xo.h file for the uniform crossover technique. #include "unixo.h" NOTE: You only need to add the #include directive for the header file of your new component to one of the listed composite header files. 4. Update makefile At this point you have made all the necessary modification to the source code. Now its time to update the makefile so the make utility will be able to compile your new component and link it with other modules to create an executable GA program. There are several modifications you will need to make to the makefile located in the Source sub-directory. First, near the top of the makefile there are several variables defined with a list of object files, you will need to add the object file for your new component to one of these variables. The BOTH variable contains object files that can be used with either generational or individuals reproduction GAs. The GEN variable contains object files for components that can only be used with generational reproduction GAs. And the IND variable contains object files for components that can only be used with individual reproduction GAs. The following shows the this portion of the makefile. # object files for components that can be used with either # generational or individual reproduction BOTH = main.o chrom.o eval_chr.o cross.o n_pt.o rank_sel.o reduced.o \ can_mute.o stat.o populat.o unixo.o # object files needed for Generational Reproduction GAs GEN = repl_gen.o sus_gen.o rk_sus_gen.o pop_gen.o # object files needed for Individual Reproduction GAs IND = killer.o repl_rank.o repl_sus.o sus_ind.o repl_rk_sus.o \ rk_sus_ind.o pop_ind.o The object file for uniform crossover (unixo.o) was added to the BOTH variable since it can be used with either type of GA. The second modification to the makefile is to add the dependency information for the object file of your component and add the commands for updating the object file if the dependents have been modified since the object file was created. The following shows the section of the makefile relating to the crossover object files. #----- Crossover Classes - implement crossover techniques # # cross.* implements 1 and 2 point crossover # n_pt.* implements Traditional crossover (where cut points # randomly selected). # reduced.* implements Reduced Surrogate crossover # unixo.* implements Uniform crossover cross.o : cross.C hdr_xo.h extern.h CC +i -c -g cross.C rm *..c n_pt.o : n_pt.C n_pt.h $(HDRS) CC +i -c -g n_pt.C rm *..c reduced.o : reduced.C reduced.h $(HDRS) CC +i -c -g reduced.C rm *..c unixo.o : unixo.C hdr_xo.h extern.h CC +i -c -g unixo.C rm *..c For example, the uniform crossover object file is depended on several header files and the source code file that implements the methods defined for this class. The lines below the dependency information are the commands necessary to update date the object file. And the final modification to the makefile is to add a new dependent for the composite header file where you added the #include directive for your component's header file in Step 3 above. The following shows the section of the makefile for the hdr_xo.h file where we added the #include directive for the header file of the uniform crossover technique. hdr_xo.h : n_pt.h reduced.h cross.h unixo.h touch hdr_xo.h Here we only need to modify the time and date of the file (by the touch command) to update the composite header file. This causes the object files that have this file as a dependent to be updated since the hdr_xo.h file is now more currently than the object files. 5. Create a Help File Now you need to create a help file for your new component. The information in this file will be displayed when the user selects your component and presses the F1 key while in the Select GA Components section of the user interface program. All of the help files are located in the Help sub-directory under the main directory where the GA Toolkit was installed. You will need to go to this directory to create the help file. In the Help directory there is a file called HELP.TEMP. This provides a template for creating your help file. To create your help file you will need to make a copy the HELP.TEMP file. Then you will need to edit your new file with a text editor and add the information about your new component. 6. Update Initialization File for the User Interface Program Now comes the fun part! For you to be able to choose your new component through the user interface program you need to update an initialization file the system reads about the available components. By placing this information in an input file, rather than hard-coding it, it is easier to update the user interface program when new components are added. This initialization file (called gat.ini in the Uif sub-directory) contains information about how to display the components and the constraints on which component can be used together. The first line of the file contains a number for the number of entries in the file. The user interface program uses an array data structure to store and manipulate the information about each component. Knowing the number of entries in the file makes it easier to create this data structure. You will need to increment this number for each entry you add to this file. The entries about each components contain several fields. Each field, with the exception of Constraints, must be delimited by commas. The following describes each field. An actual portion of the gat.ini file will be presented and explained later. Menu Category - this is the name to display on the menu of components (e.g., Chromosome or Selection). Sub-Menu - this indicates whether there is a sub-menu for the category and the name of the sub-category (e.g., Representation and Decoding are sub-categories for Chromosome). Use "None" in this field if there is no sub-category. Name of Component - this is name that will be displayed for the component (e.g., GENESIS Rank and Linear Rank components in the Selection category). #define Identifier - this is the #define identifier associated with the component (e.g., CROSS is the #define identifier for Uniform Crossover). Value for #define - this the value to associate with the #define identifier when the component is selected (e.g., we would give CROSS the value uniform_xo when Uniform Crossover is selected). Help file - this is the name of the file (in the Help sub-directory) that contains information about the component to displayed when the user presses the F1 key. Number of Constraints - this is the number of lines that follow describing the constraints about which components can be used with this technique. Constraints - each constraint is on a separate line following the entry for the component. Each constraint line must begin with a tab character. Constraints have the following format: identifier operator value The system currently understands the four operators described below. Examples of these operators will be discussed later. = This is the equal operator. It means that the current value associated with the #define identifier must be equal to the value specified in the constrain for this component to be compatible with the current configuration. != This is the not equal operator. It means that as long as the current value associated with the #define identifier is not equal to the value specified in the constraint then this component is compatible with the current configuration. > This is the implies operator. It means that if this component is selected and all the other constraints are satisfied then the component designated by the #define identifier and value of the constraint must also be selected. For example, if you are using SUS selection with individual reproduction, the repl_sus class must be used in the replacement phase. ?= This is the if operator. It means that if the identifier is currently associated with the value in the constraint then evaluate the constraints in the following lines preceded by two tab characters. The follow shows portions of the gat.ini file. The actual contents of the file are print in a Courier font and additional comments explaining the entries are printed in a Times Roman font. The first three entries in the file are for the chromosome category. The first entry is for the Representation sub-category. The name of the component is 1-D Bit String (the third field). When this component is selected the system will assign the REP_CHROM identifier the value of bit_chrom (the fourth and fifth fields). The bit_str.hlp file in the Help sub-directory contains additional information the user may access via the F1 key. And the last field contains a zero indicating that there are no constraints for this component. 24 Chromosome,Representation,1-D Bit String,REP_CHROM,bit_chrom,bit_str.hlp,0 Chromosome,Decoding,Binary,CODE_CHROM,bit_chrom,binary.hlp,0 Chromosome,Decoding,Gray,CODE_CHROM,gray_chrom,gray.hlp,0 The entry below is for the Individual Reproduction. This is probably the most complicated entry because of the various constraints. But this is a good example to explain how the constraints work. All of the constraints in this case use the if operator. The system will evaluate each constraint with the if operator and if the condition is true it will evaluate the constraints below it that are precede by two tab characters. So the first two constraints mean that when the user selects individual reproduction if we are currently using the generational version of SUS (sus_gen) or GENESIS Rank (rank_sus_gen), respectively, that we should switch to the individual reproduction version of the selection technique. The third constraint has two if conditions. It means the if Linear Rank selection (rank_sel) and the Generational Emulation replacement policy (kill_term) are currently selected the we need to change the replacement policy to the Remove Worst technique (kill_worst) since Generational Emulation is not compatible with Linear Rank selection. And the final constraint is a little confusing. It says the if Linear Rank selection is being used then we need to change the selection component to Linear Rank. Now this seem silly. But the implies constraint (>) actually causes the program to evaluate the constraints for the component we what to change. So the program will evaluate the constraints for Linear Rank selection. One of these constraints is that if individual reproduction is being used then we also need the repl_rank component for the replacement phase which inserts new offspring into the population it the proper order according to their rank. Reproduction,None,Individual,POP_TYPE,pop_ind,ind.hlp,9, SELECT ?= sus_gen SELECT > sus_ind SELECT ?= rank_sus_gen SELECT > rank_sus_ind SELECT ?= rank_sel KILLER ?= kill_term KILLER > kill_worst SELECT ?= rank_sel SELECT > rank_sel The entries below are most of the entries for the Selection category. The first entry is for the generational version of SUS selection. The only constraint on this component is that the style for reproduction must be generational (pop_gen). The last entry is for Linear Rank selection with individual reproduction. The constraint says we must be using individual reproduction for this component to be compatible. The second constraint says the we can use Linear Rank selection as long as we are not using the Generational Emulation replacement policy (kill_term). And the final constraint says the to use this component we must also use the repl_rank class for the replacement phase so that the offspring will be inserted in the proper position within the population, according to their rank. Selection,None,SUS,SELECT,sus_gen,sus.hlp,1, POP_TYPE = pop_gen Selection,None,SUS,SELECT,sus_ind,sus.hlp,2, POP_TYPE = pop_ind REPLACE > repl_sus Selection,None,Linear Rank,SELECT,rank_sel,lrank.hlp,1, POP_TYPE = pop_gen Selection,None,Linear Rank,SELECT,rank_sel,lrank.hlp,3, POP_TYPE = pop_ind KILLER != kill_term REPLACE > repl_rank The entries below are for the Crossover category. The first entry is the item we added for the Uniform crossover technique. In addition, we needed to add several constraints to the next two entries (Traditional and Reduced Surrogate crossover, respectively). These constraints mean the if we select either Traditional or Reduced Surrogate crossover when Uniform crossover is the component that we are currently using, then we need to change the CROSS identifier to one_pt. That is, it is invalid for the CROSS_TYPE identifier to be associated with either n_pt (Traditional) or reduced_sur (Reduced Surrogate) when the CROSS identifier is set to uniform_xo (Uniform crossover). Since the user has selected either Traditional or Reduced Surrogate crossover we need to change the CROSS identifier to either one_pt or two_pt and by default the system will use one_pt. Crossover,Style,Uniform XO,CROSS,uniform_xo,unif_xo.hlp,0 Crossover,Style,Traditional,CROSS_TYPE,n_pt,trad_xo.hlp,2, CROSS ?= uniform_xo CROSS > one_pt Crossover,Style,Reduced Surrogate,CROSS_TYPE,reduced_sur,surr_xo.hlp,2, CROSS ?= uniform_xo CROSS > one_pt Crossover,# Cut Points,One,CROSS,one_pt,none,0 Crossover,# Cut Points,Two,CROSS,two_pt,none,0 So to be able to select your new component through the user interface program you will need to add an entry to the gat.ini file. When you add your entry you should place it in the file with the other entries for that type of technique. For example if your created a new selection technique you should add the new entry in the area of the gat.ini file the has other entries for the selection category. In addition, you may need to modify the constraints for some the existing entries to ensure compatibility with your new component. And finally, do not forget to update the number on the first line of gat.ini file to accurately represent the number of component entries in the file. After you have modified the gat.ini file you can execute the user interface program and you should be allow to select your new component. There is no need to re-compile the user interface program since gat.ini is only an input file. {tc "Adding a New Parameter" \L 2}Adding a New Parameter This section describes how you can add a new input parameter for the GA programs that you create with the GA Toolkit. This is especially useful if you create a new component that needs a new parameter. For example, for the Uniform crossover technique that was recently added to the system needed a parameter to specify the probability for swapping each position on the chromosome. The following describes the process for adding a new parameter. 1. Declaring a Variable for the New Parameter The first step is to declare a new variable for your parameter. You will need to modify two files to declare your new variable and to provide the component access to your parameter. All user-defined input parameters for the GA Toolkit are declared as global variables so that all component will have access to the parameters they require. The global.h file in the Source sub-directory contains all the declaration of global variables. You will need to add a declaration for your new variable in this file. For example, we added the following for the swapping probability parameter. double SWAP_PROB; The second file you need to modify is extern.h (in the Source sub-directory). This file contains all the external declaration for the global variables. This is necessary so that all the components will have access to the proper variables. For example, we added the following external declaration for the swapping probability parameter. extern double SWAP_PROB; 2. Update the Input Section In this step you need to modify the input.h file (in the Source sub-directory). This file contains several #define directives used for reading the input parameters from the in file. The INPUT_FORM identifier is used for the fscanf format string. And the INPUT_VARS identifier is used as the list of input parameter variables in the fscanf call. So by updating the #define identifies in the input.h file the program will be able to read in your new parameter. The following shows a section of the input.h file where we updated the INPUT_FORM and INPUT_VARS #define identifiers. #define INPUT_FORM " \ Total Experiments = %d \ Total Trials = %d \ Population Size = %d \ DeJong Gap = %lf \ Elite Gap = %lf \ Prob. Cross = %lf \ Prob. Mutation = %lf \ Selection Bias = %lf \ Replacement Bias = %lf \ Maximum Spin = %d \ Statistics Interval = %d \ Hamming Metrics = %d \ Best Size = %d \ Worst Size = %d \ Maximize = %d \ Evaluate All = %d \ Seed Population = %d \ Debugging Level = %d \ Display = %d \ Print Generations = %d \ Random Seed = %lu \ Swapping Prob. = %lf" #define INPUT_VARS &MAX_EXPER, &MAX_TRIALS, &POP_SIZE, &GAP, \ &ELITE_GAP, &P_CROSS, &P_MUTE, &SELECT_BIAS, &REPLACE_BIAS, \ &MAX_SPIN, &STATS_GAP, &HAM_FLAG, &BEST_SIZE, &WORST_SIZE, &MAX_FLAG, \ &EVAL_ALL, &INIT_FLAG, &DEBUG_FLAG, &DISPLAY_FLAG, &PRINT_GEN, &SEED, \ &SWAP_PROB As you can see we just added our new "Swapping Prob." parameter to the end of the INPUT_FORM identifier and the variable SWAP_PROB to the end of the INPUT_VARS identifier. Note: The INPUT_VARS identifier is used as the list of argument to the fscanf function so the "&" in front of the variable name is necessary so the value will be actually be store in the variable after the fscanf function returns. 3. Update the Help File for Input Parameters The next step is to add a short description about your new parameter to the in.hlp file (in the Help sub-directory). This file contains information about all the input parameters. You should use a text editor to edit the in.hlp file and add some information about your new parameter. 4. Update the Parameter Initialization File for the User Interface Program The final step for adding a new parameter is to update an initialization file, called gat_in.ini in the Uif sub-directory, so that you can create input files with this new parameter via the user interface program. The gat_in.ini file contains information about the parameters so the user interface program can ensure that only appropriate values are entered for each parameter. The first line in the gat_in.ini file contains the number parameter entries in the file. The user interface program uses an array data structure for storing and manipulation the parameter information. The number on the first line of the file makes it easier to create this data structure. When you add information to this file for new parameters you will need to update this number to accurately reflect the number of entries in the file. Each remaining lines in the file contains information about one parameter. Each line or entry contains several comma delimited fields described below. The actual gat_in.ini file will be presented and explained later. Parameter Name - this is a name for the parameter. You must use exactly the same name you used in the INPUT_FORM identifier in the input.h file. For example, for the swapping probability parameter we used the name "Swapping Prob." so that is the value we must use for this field. Default Value - this the default value for the parameter that will initially be displayed in the user interface program where the user begins creating the in file. Type of Parameter - this field is used so that the validity of the user's input can be checked. The system currently understands the following entries for this field. 1 the value for this parameter must be greater than the value of the Minimum field. 2 the value for this parameter must be greater than or equal to the value of the Minimum field. 3 the value for this parameter must be an even integer. 4 the value for this parameter must be in the range [Minimum field, Maximum field]. 5 the value for this parameter must be in the range (Minimum field, Maximum field]. 6 the value for this parameter can be 0 or 1. This is for boolean flags. A value of 0 is displayed as "no" and a 1 is displayed as "yes" in the user interface program. The default value for this field must be 1 or 0 (not yes or no). Minimum - this field is only used with parameters of type 1, 2, 4 or 5. Maximum - this field is only used with parameters of type 4 or 5. The following shows the actual gat_in.ini file and explains several of the entries. The contents of the file is printed in a Courier font and additional comments are printed in a Times Roman font. 22 Total Experiments,1,1,0 The Total Trials parameter has a default value of 1000. This is a type 1 parameter and the Minimum field is 0 which means that the value for this parameter must be greater than 0. Total Trials,1000,1,0 Population Size,50,3 DeJong Gap,1.0,4,0.0,1.0 Elite Gap,1.0,4,0.0,1.0 The Prob. Cross entry is for the cross probability parameter. It has a default value of 1.0. This is a type 4 parameter with a minimum of 0.0 and maximum of 1.0. So the values for this parameter must be in the following range [0.0, 1.0]. Prob. Cross,1.0,4,0.0,1.0 Prob. Mutation,0.001,4,0.0,1.0 Selection Bias,1.5,5,1.0,2.0 Replacement Bias,1.5,5,1.0,2.0 Maximum Spin,3,1,0 Statistics Interval,200,1,0 Hamming Metrics,0,6 Best Size,1,2,0 Worst Size,3,2,0 The Maximize parameter is boolean flag (type 6) parameter for whether the GA will be applied to a maximization or minimization problem. The default value is 1 (the flag is on) indicating a maximization problem. Maximize,1,6 Evaluate All,0,6 Seed Population,0,6 Debugging Level,0,2,0 Display,0,6 Print Generations,0,6 Random Seed,123456789,2,0 The Swapping Prob. entry is what we added for the swapping probability parameter for the Uniform crossover technique. The default value is 0.5 or on average half the positions on the chromosome will be swapped. This a type 4 parameter with a minimum of 0.0 and a maximum of 1.0. So this parameter can be assigned values in the following range [0.0, 1.0]. Swapping Prob.,0.5,4,0.0,1.0 When you add an entry for your new parameter it must be in the same relative position as you added the information in the input.h file. For example, we added the Swapping Prob. parameter to the end of the INPUT_FORM and INPUT_VARS #define identifiers in the input.h file so we must also add our entry for this parameter to the end of the list in the gat_in.ini file. You can arrange the parameters in any order you like also long as you maintain the same order in the gat_in.ini file and in the INPUT_FORM #define identifier (with corresponding variables in the INPUT_VARS #define identifier). After you have completed the necessary modification to the gat_in.ini file you can execute the user interface program and create an in input file with your new parameter. {tc "REFERENCES" \l 1}References Ackley, D. H. A Connectionist Machine for Genetic Hillclimbing. Boston, MA: Kluwer Academic Publishers, 1987. Baker, J. E. "Adaptive Selection Methods for Genetic Algorithms." In Proceedings of the First International Conference on Genetic Algorithms and Their Applications, ed. J. J. Grefenstette, 101-111. Hillsdale, NJ: Lawrence Erlbaum Associates, 1985. Baker, J. E. "Reducing Bias and Inefficiency in the Selection Algorithm." In Genetic Algorithms and Their Applications: Proceedings of the Second International Conference on Genetic Algorithms, ed. J. J. Grefenstette, 14-21. Hillsdale, NJ: Lawrence Erlbaum Associates, 1987. Booker, L. "Improving Search in Genetic Algorithms." In Genetic Algorithms and Simulated Annealing, ed. Lawrence Davis, 61-73. Los Altos, CA: Morgan Kaufmann Publishers, Inc., 1987. DeJong, K. A. "An Analysis of the Behavior of a Class of Genetic Adaptive Systems." Ph.D. dissertation, University of Michigan, 1975. Frederick, W., R. Sedlmeyer and C. White. "The Hamming Metric in Genetic Algorithms and Its Application to Two Network Problems." Computer Science Department, Indiana University-Purdue University at Fort Wayne, (unpublished manuscript), 1991. Goldberg, D. E. Genetic Algorithms in Search, Optimization, and Machine Learning. Reading, MA: Addison-Wesley Publishing Company, Inc., 1989. Grefenstette, J. J. GENESIS Software (Version 5.0). Distributed by The Software Partnership, Melrose, MA, 1990. Holland, J. H. Adaptation in Natural and Artificial Systems. Ann Arbor, MI: The University of Michigan Press, 1975. Lienau, S. A. Effective Investigation of Genetic Algorithms. Master's Thesis, University of Missouri-Kansas City, 1992. Syswerda, G. "Uniform Crossover in Genetic Algorithms." In Proceedings of the Third International Conference on Genetic Algorithms, ed. J. D. Schaffer, 2-9. San Mateo, CA: Morgan Kaufmann Publishers, Inc., 1989. Whitley, D. "The GENITOR Algorithm and Selection Pressure: Why Rank-Based Allocation of Reproductive Trials is Best." In Proceedings of the Third International Conference on Genetic Algorithms, ed. J. D. Schaffer, 116-121. San Mateo, CA: Morgan Kaufmann Publishers, Inc., 1989. Whitley, D. "Fundamental Principles of Deception in Genetic Search." In Foundations of Genetic Algorithms, ed. G. J. E. Rawlins, 221-241. San Mateo, CA: Morgan Kaufmann Publishers, Inc., 1991. {PAGE|33} {PAGE|i} {PAGE|55} {PAGE|57} {PAGE|56}